# Video Content Analysis

NABLA VL
Apache-2.0
A Japanese Vision-Language Model (VLM) developed by NABLAS, supporting image, multi-image, and video inputs, suitable for various multimodal tasks.
Image-to-Text Transformers Japanese
nablasinc
1,673
2
Video R1 7B
Apache-2.0
Video-R1-7B is a multimodal large language model built on Qwen2.5-VL-7B-Instruct and optimized for video reasoning tasks. It can understand video content and answer questions about it.
Video-to-Text Transformers English
Video-R1
2,129
9
Youtube Xlm Roberta Base Sentiment Multilingual
A YouTube comment sentiment analysis model fine-tuned from cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual, with a reported accuracy of 80.17%.
Text Classification
AmaanP314
91
1
Ola Video
Apache-2.0
Ola-7B is a multimodal language model jointly developed by Tencent, Tsinghua University, and Nanyang Technological University. Based on the Qwen2.5 architecture, it accepts text, image, video, and audio inputs and produces text output.
Safetensors Supports Multiple Languages
THUdyh
82
1
Smolvlm2 256M Video Instruct
Apache-2.0
SmolVLM2-256M-Video is a lightweight multimodal model designed for analyzing video content. It processes video, image, and text inputs and generates text output.
Image-to-Text Transformers English
HuggingFaceTB
22.16k
53
Eagle2 9B
Eagle2 is a series of vision-language models that improves performance through optimized data strategies and training methods. Eagle2-9B is the large model in the series and balances performance against inference speed.
Text-to-Image Transformers Other
KnutJaegersberg
15
4
Internvl 2 5 HiCo R64
Apache-2.0
A video multimodal large language model built on Long and Rich Context (LRC) modeling, which improves existing MLLMs by sharpening their perception of fine-grained details and capturing long-term temporal structure.
Video-to-Text Transformers English
OpenGVLab
252
2
Videomae Large Finetuned Deepfake Subset
A version of the MCG-NJU/videomae-large model fine-tuned on the deepfake detection challenge dataset, used for video deepfake detection.
Video Processing Transformers
shylhy
519
0
Mplug Owl3 2B 241014
Apache-2.0
mPLUG-Owl3 is a multimodal large language model focused on long image sequence understanding. Its Hyper Attention mechanism significantly improves processing speed and the sequence length the model can handle.
Text-to-Image English
mPLUG
2,680
6